Compiling method and compiling device

ABSTRACT

A computer converts a source program into intermediate code. The computer detects, based on profile information related to a memory access for accessing target data stored in a memory, a memory access that fits an access pattern corresponding to an operating condition of a hardware prefetch function for the target data. The target data is data for which a prefetch instruction is to be inserted in advance. The prefetch instruction is an instruction for transferring data stored in the memory to a cache memory. The computer computes an evaluation value for the target data based on a length of successive memory accesses that fit the access pattern, and determines, based on the evaluation value, whether to suppress insertion of a prefetch instruction for the target data. The computer updates the intermediate code based on a result of the determination, and converts the updated intermediate code into a machine language program.

CROSS-REFERENCE TO RELATED APPLICATION

This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2014-101654, filed on May 15, 2014, the entire contents of which are incorporated herein by reference.

FIELD

The embodiment discussed herein is related to a compiling method and a compiling device.

BACKGROUND

A compiling device, for example, performs a compilation process for converting a source program into a machine language program that is executable by an arithmetic processing device such as a processor and the like. The arithmetic processing device, for example, includes a decoding unit, an arithmetic unit, and a cache memory. The decoding unit decodes instructions included in the machine language program generated through the compilation process. The arithmetic unit performs arithmetic operations based on the decoded instructions. The cache memory is arranged between the arithmetic unit and a main memory that is a main storage device. The arithmetic unit, for example, performs arithmetic operations by referring to data stored in the main memory or the cache memory. The cache memory, for example, stores data that is referred to by the arithmetic unit.

The arithmetic processing device, for example, may shorten a waiting time for reference of data by referring to data stored in the cache memory when compared with referring to data stored in the main memory. A hit rate of the cache memory decreases in a numerical calculation process that uses large-scale data such as an array and the like because data locality is low. In this case, the effect of shortening the waiting time for reference of data is small since the cache memory is not effectively used.

As a measure for remedying a decrease in the hit rate of the cache memory, for example, a prefetch is used to transfer data stored in the main memory to the cache memory in advance. Examples of a method for realizing the prefetch include a software prefetch realized by software and a hardware prefetch realized by hardware.

In the software prefetch, for example, the compiling device inserts an instruction (also referred to as a prefetch instruction hereinafter) to transfer data stored in the main memory to the cache memory in advance into the machine language program. In the hardware prefetch, hardware such as a hardware prefetch mechanism and the like is disposed inside the arithmetic processing device. For example, when it is determined that an access to a continuous memory is performed, the hardware prefetch mechanism predicts data that is accessed next and transfers the data stored in the main memory to the cache memory in advance.

In an arithmetic processing device that includes the hardware prefetch mechanism, for example, performance of the arithmetic processing device may be degraded even though the software prefetch is applied. For example, there may be a case where both of a prefetch by the hardware prefetch mechanism and the prefetch by a prefetch instruction are performed for data at the same address. That is to say, an unnecessary prefetch instruction may be inserted into the machine language program. In this case, for example, performance degradation such as an increase in the number of instructions, a decrease in the process efficiency of scheduling, a decrease in the transfer speed due to an increase in the amount of consumption of a bus, and the like may be caused.

For this reason, a technology has been proposed to efficiently achieve a balance between the hardware prefetch and the software prefetch and to improve the performance of the arithmetic processing device. For example, a memory access instruction added with indicative information, which indicates whether the memory access instruction is a target of the hardware prefetch, is used in this type of an arithmetic processing device. In the compilation process, for example, a memory access instruction added with the indicative information is generated when memory access instructions for accessing a continuous memory are detected.

For this reason, the arithmetic processing device, for example, includes a decoding unit capable of decoding a memory access instruction added with indicative information and an arithmetic unit capable of executing the memory access instruction added with the indicative information. Furthermore, the arithmetic processing device includes a hardware prefetch mechanism compliant with the memory access instruction added with the indicative information. For example, the hardware prefetch is suppressed when the indicative information indicates that the memory access instruction is not a target of the hardware prefetch.

Related techniques are disclosed in, for example, Japanese Laid-open Patent Publication No. 2009-230374, Japanese National Publication of International Patent Application No. 2011-504274, Japanese Laid-open Patent Publication No. 2010-244204, Japanese Laid-open Patent Publication No. 2006-330813, Japanese Laid-open Patent Publication No. 2011-81836, Japanese Laid-open Patent Publication No. 2002-297379, and Japanese Laid-open Patent Publication No. 2001-166989.

When a software prefetch is applied to an arithmetic processing device that includes a hardware prefetch mechanism, for example, an unnecessary prefetch instruction may be inserted into the machine language program. This may cause degradation of performance of the arithmetic processing device. The compiling method that generates a memory access instruction added with indicative information, which indicates whether the memory access instruction is a target of the hardware prefetch, has low versatility. For example, in the compiling method that generates a memory access instruction added with the indicative information, the performance of the arithmetic processing device may be degraded when the decoding unit, the arithmetic unit, and the hardware prefetch mechanism are not compliant with the memory access instruction added with the indicative information.

SUMMARY

According to an aspect of the present invention, provided is a compiling method. In the compiling method, a computer converts a source program into intermediate code. The computer detects, on the basis of profile information related to a memory access for accessing target data stored in a memory, a memory access that fits an access pattern corresponding to an operating condition of a hardware prefetch function for the target data. The target data is data for which a prefetch instruction is to be inserted in advance. The prefetch instruction is an instruction for transferring data stored in the memory to a cache memory. The computer computes an evaluation value for the target data on the basis of a length of successive memory accesses that fit the access pattern. The computer determines, on the basis of the evaluation value, whether to suppress insertion of a prefetch instruction for the target data. The computer updates the intermediate code on the basis of a result of the determination. The computer converts the updated intermediate code into a machine language program.

The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a diagram illustrating a compiling device according to an embodiment;

FIG. 2 is a flowchart illustrating an example of the operation of the compiling device;

FIG. 3 is a diagram illustrating an example of an arithmetic processing device that executes a machine language program generated by the compiling device;

FIG. 4 is a diagram illustrating an example of a method for generating profile information;

FIG. 5 is a diagram illustrating an example of a method for applying the profile information;

FIG. 6 is a diagram illustrating an example of a relationship between an access order of array data and a prefetch;

FIG. 7 is a diagram illustrating examples of the profile information and an access attribute table;

FIG. 8 is a flowchart illustrating an example of generation of the access attribute table;

FIG. 9 is a diagram illustrating examples of the access attribute table and a succession length management table;

FIG. 10 is a flowchart illustrating an example of generation of the succession length management table; and

FIG. 11 is a diagram illustrating an exemplary hardware configuration of the compiling device.

DESCRIPTION OF EMBODIMENT

Hereinafter, an embodiment will be described with reference to the drawings.

FIG. 1 illustrates a compiling device according to the present embodiment. A compiling device 10 according to the present embodiment may be realized only by hardware or may be realized by hardware under control of software. For example, the compiling device 10 may be realized by a computer that executes a compilation program.

The compiling device 10, for example, performs a compilation process for converting a source program into a machine language program. The machine language program is executable by a processor having a hardware prefetch function of transferring data stored in a main memory, which is a main storage device, to a cache memory in advance. The compiling device 10, for example, receives a source program SPRG and profile information PINF to output a machine language program MPRG.

The profile information PINF, for example, is information related to a memory access for accessing target data. The target data is data for which a prefetch instruction for transferring data stored in the main memory to the cache memory is to be inserted in advance. A prefetch instruction is inserted into the machine language program MPRG when a software prefetch is performed through the compilation process. A prefetch instruction may be realized by an instruction for loading data or may be realized by an instruction dedicated to a prefetch.

The target data, for example, is array data (referred to as indirectly accessed array data) that is accessed indirectly. An access destination in the indirectly accessed array data is determined when the arithmetic processing device such as the processor and the like executes the source program. For this reason, it is difficult, by using hardware, to predict in advance an access destination of the indirectly accessed array data. Therefore, performing the software prefetch for the indirectly accessed array data improves performance of the arithmetic processing device.

The hardware prefetch may be performed when an access destination of the indirectly accessed array data changes regularly. The hardware prefetch, for example, is performed by a hardware prefetch control device 140 of an arithmetic processing device 100 illustrated in FIG. 3. The performance of the arithmetic processing device may be degraded by performing the software prefetch when the hardware prefetch is performed. For this reason, the compiling device 10 determines whether an access destination of the indirectly accessed array data changes regularly on the basis of the profile information PINF and determines whether to suppress insertion of a prefetch instruction on the basis of the determination result.

The target data is not limited to indirectly accessed array data. For example, the target data may be data that is accessed by a pointer variable. Also in the case of an access by a pointer variable, an access destination, for example, is determined when the arithmetic processing device executes the source program. Hereinafter, an irregular memory access of which the access destination is determined when the arithmetic processing device executes the source program is also referred to as an irregular indirect memory access. In addition, data for which the irregular indirect memory access is performed is also referred to as irregularly and indirectly accessed data.

That is to say, the compiling device 10 may set the data for which the irregular indirect memory access is performed as the target data, not limiting the target to the indirectly accessed array data and the data that is accessed by a pointer variable. Hereinafter, the target data will be described as the indirectly accessed array data.

The compiling device 10, for example, includes a code conversion unit 20, an optimization unit 30, and a code generation unit 40. The code conversion unit 20 is an example of a conversion unit that converts the source program into intermediate code. The code conversion unit 20, for example, receives the source program SPRG and converts source code of the source program SPRG into intermediate code. The intermediate code generated by the code conversion unit 20 is transferred to the optimization unit 30.

The optimization unit 30 is an example of an optimization unit that updates the intermediate code. The optimization unit 30, for example, receives the intermediate code and the profile information PINF and optimizes the intermediate code. The intermediate code optimized by the optimization unit 30 is transferred to the code generation unit 40. Whether to insert a prefetch instruction into the machine language program MPRG is determined when the intermediate code is optimized.

The optimization unit 30, for example, includes an access attribute detection unit 32, an evaluation value computation unit 34, and a prefetch instruction insertion unit 36. The access attribute detection unit 32 is an example of a detection unit that detects, from among memory accesses for accessing the target data, a memory access that fits a predetermined access pattern on the basis of the profile information of the target data related to a memory access. The predetermined access pattern, for example, is an access pattern corresponding to an operating condition of the hardware prefetch function.

For example, the access attribute detection unit 32 detects, on the basis of the profile information PINF, a memory access (referred to as a memory access of a predetermined access pattern) that fits the predetermined access pattern, from among pieces of data in the array data that is repeatedly accessed through a loop process. Then, the access attribute detection unit 32 generates information (also referred to as an access attribute table hereinafter) that includes an access attribute indicating whether the memory access fits the predetermined access pattern. The access attribute table generated by the access attribute detection unit 32 is transferred to the evaluation value computation unit 34.

The evaluation value computation unit 34 is an example of a computation unit that computes an evaluation value for target data on the basis of the length of successive memory accesses that fit the predetermined access pattern. The evaluation value computation unit 34, for example, receives the access attribute table generated by the access attribute detection unit 32. Then, the evaluation value computation unit 34 computes an evaluation value for target data on the basis of the access attribute included in the access attribute table. The evaluation value computation unit 34, for example, computes the length (also referred to as a succession length hereinafter) of successive memory accesses that fit the predetermined access pattern. Then, the evaluation value computation unit 34 computes an evaluation value for target data on the basis of the succession length. The evaluation value computed by the evaluation value computation unit 34 is transferred to the prefetch instruction insertion unit 36.

The prefetch instruction insertion unit 36 is an example of an update unit that determines whether to suppress insertion of a prefetch instruction for target data on the basis of the evaluation value and updates the intermediate code on the basis of the determination result. The prefetch instruction insertion unit 36, for example, receives the evaluation value computed by the evaluation value computation unit 34. Then, the prefetch instruction insertion unit 36 determines whether to suppress insertion of a prefetch instruction for each target data on the basis of the evaluation value and updates the intermediate code on the basis of the determination result.

The prefetch instruction insertion unit 36 updates the intermediate code, for example, by inserting a prefetch instruction for target data having an evaluation value smaller than a predetermined threshold, and suppressing insertion of a prefetch instruction regarding target data having an evaluation value greater than or equal to the threshold. Accordingly, the intermediate code, for example, is optimized. The intermediate code optimized by the prefetch instruction insertion unit 36 is transferred to the code generation unit 40.

The code generation unit 40 is an example of a code generation unit that converts the updated intermediate code into a machine language program. The code generation unit 40, for example, receives the intermediate code updated by the prefetch instruction insertion unit 36. Then, the code generation unit 40 converts the intermediate code updated by the prefetch instruction insertion unit 36 into the machine language program MPRG.

In this manner, the compiling device 10 may generate the machine language program MPRG which is executable by a processor having a hardware prefetch function and in which insertion of unnecessary prefetch instructions is reduced. That is to say, the compiling device 10 may reduce insertion of unnecessary prefetch instructions when prefetch instructions are inserted into the machine language program MPRG through the compilation process.

FIG. 2 illustrates an example of an operation of the compiling device 10 illustrated in FIG. 1. The operation illustrated in FIG. 2 may be realized only by hardware or may be realized by hardware under control of software. For example, a computer may read a compilation program recorded in a storage medium to perform the operation illustrated in FIG. 2. In other words, the compilation program may cause the computer to perform the operation illustrated in FIG. 2.

In the operation illustrated in FIG. 2, the profile information PINF is generated before S100 is performed. The profile information PINF, for example, may be generated by the compiling device 10 or may be generated by a compiling device different from the compiling device 10.

In S100, the code conversion unit 20 converts a source program into intermediate code.

In S200, the access attribute detection unit 32 determines an access attribute of target data, for which an irregular memory access is performed, on the basis of the profile information PINF related to a memory access for accessing data. For example, the access attribute detection unit 32 detects, on the basis of the profile information PINF of target data, a memory access of a predetermined access pattern, that is, a memory access that fits an access pattern corresponding to an operating condition of a hardware prefetch function, from among the memory accesses to the target data.

For example, the access attribute detection unit 32 detects, as the memory access of the predetermined access pattern, a memory access such that an amount (referred to as a destination shift amount) of difference between the current access destination and the previous access destination is smaller than two lines when a cache line of the cache memory is used as a unit of access. The access attribute detection unit 32 may detect, as the memory access of the predetermined access pattern, a memory access such that the access destination for accessing data changes in either one direction of an ascending order and a descending order at a certain access width. Then, the access attribute detection unit 32 generates an access attribute table that includes an access attribute indicating whether the memory access is a memory access of the predetermined access pattern.
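
As a minimal sketch of the first detection rule described above, the following Python code flags each access whose destination shift amount, counted in cache lines, is smaller than two lines. The function name access_attributes, the 128-byte line size, and the choice to emit no flag for the first access (which has no previous destination) are assumptions made for illustration and are not taken from the embodiment.

```python
LINE_SIZE = 128  # assumed cache line size in bytes (not given in the text)


def access_attributes(addresses, line_size=LINE_SIZE):
    """Return one flag per access after the first.

    A flag is True when the destination shift amount, measured in cache
    lines relative to the previous access, is smaller than two lines.
    """
    flags = []
    for previous, current in zip(addresses, addresses[1:]):
        shift_in_lines = abs(current // line_size - previous // line_size)
        flags.append(shift_in_lines < 2)
    return flags


if __name__ == "__main__":
    # Hypothetical addresses recorded for five accesses to the target data.
    print(access_attributes([0, 64, 128, 1024, 1100]))
```

The resulting list of flags plays the role of the access attribute column in this sketch; the actual layout of the access attribute table may differ.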

In S300, the evaluation value computation unit 34 computes an evaluation value for target data on the basis of the access attribute included in the access attribute table generated in S200. The evaluation value computation unit 34, for example, computes a succession length of successive memory accesses that fit the predetermined access pattern on the basis of the access attribute included in the access attribute table generated in S200. Then, the evaluation value computation unit 34 computes the evaluation value for target data on the basis of the succession length. The evaluation value computation unit 34, for example, computes the maximum value of the succession length in target data as the evaluation value.
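
The computation in S300 can be sketched as follows, assuming the access attributes are available as a list of booleans in access order (True meaning that the access fits the predetermined access pattern). The helper names succession_lengths and evaluation_value_max are hypothetical.

```python
def succession_lengths(flags):
    """Return the length of every run of consecutive True flags."""
    runs = []
    current = 0
    for fits in flags:
        if fits:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def evaluation_value_max(flags):
    """Evaluation value taken as the maximum succession length (0 if none)."""
    return max(succession_lengths(flags), default=0)
```

For instance, the flags True, True, False, True contain runs of length 2 and 1, so the evaluation value in this sketch is 2.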

The evaluation value computation unit 34, for example, may compute the evaluation value on the basis of the frequency (occurrence frequency) of occurrence of successive memory accesses for each succession length. The evaluation value computation unit 34, for example, may perform a multiply-accumulate operation on the occurrence frequency of successive memory accesses for each succession length and a weighting coefficient corresponding to the size of the succession length and compute the result of the multiply-accumulate operation as the evaluation value. The weighting coefficient may be set to be rewritable by a user or may be a fixed value.
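
The multiply-accumulate variant might look like the sketch below. It takes a list of succession lengths, counts how often each length occurs, and sums frequency times weight. The text does not specify the weighting coefficients, so the weight function passed in here is purely an assumption for illustration.

```python
from collections import Counter


def evaluation_value_weighted(run_lengths, weight):
    """Sum of (occurrence frequency x weighting coefficient) per length.

    run_lengths: succession lengths observed for one target data.
    weight: callable mapping a succession length to its coefficient
            (hypothetical; the embodiment leaves the values open).
    """
    frequency = Counter(run_lengths)
    return sum(count * weight(length) for length, count in frequency.items())


if __name__ == "__main__":
    # Assumed weighting: the coefficient equals the succession length itself.
    print(evaluation_value_weighted([2, 2, 5], weight=lambda length: length))
```

Passing the weight as a callable mirrors the statement that the coefficient may be rewritable by a user; a fixed table would serve equally well.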

In S400, the prefetch instruction insertion unit 36 determines whether the evaluation value computed in S300 is smaller than a predetermined threshold for each target data. The threshold, for example, is a value that is approximately half the number of data in the target data. When the evaluation value is smaller than the threshold (Yes in S400), the operation of the compiling device 10 proceeds to S500. That is to say, S500 is performed for target data having an evaluation value smaller than the threshold. When the evaluation value is greater than or equal to the threshold (No in S400), the operation of the compiling device 10 proceeds to S600. That is to say, S600 is performed for target data having an evaluation value greater than or equal to the threshold.

The determination in S400 corresponds to determination of whether performing the software prefetch causes performance degradation of the arithmetic processing device. The compiling device 10, for example, determines that the performance of the arithmetic processing device is improved in a case where the software prefetch is performed for target data having an evaluation value smaller than the threshold when compared with a case where the software prefetch is not performed. Conversely, the compiling device 10 determines that the performance of the arithmetic processing device is degraded in a case where the software prefetch is performed for target data having an evaluation value greater than or equal to the threshold when compared with a case where the software prefetch is not performed.
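
The branch in S400 reduces to a single comparison. The sketch below assumes a threshold of exactly half the number of data elements (the text says only "approximately half") and uses the hypothetical name should_insert_prefetch.

```python
def should_insert_prefetch(evaluation_value, num_elements, threshold=None):
    """Return True for the S500 path (insert), False for the S600 path (suppress).

    The default threshold of num_elements // 2 is an assumption standing in
    for "approximately half the number of data in the target data".
    """
    if threshold is None:
        threshold = num_elements // 2
    return evaluation_value < threshold
```

With ten elements, an evaluation value of 2 leads to insertion, whereas a value of 5 or more leads to suppression.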

In S500, the prefetch instruction insertion unit 36, for example, generates intermediate code (referred to as intermediate code for the prefetch instruction) for performing the prefetch instruction for target data having an evaluation value smaller than the threshold.

In this manner, the prefetch instruction insertion unit 36 prepares for insertion of a prefetch instruction for target data for which the performance of the arithmetic processing device is determined to be improved by performing the software prefetch.

In S600, the prefetch instruction insertion unit 36, for example, does not generate intermediate code (intermediate code for the prefetch instruction) for performing the prefetch instruction for target data having an evaluation value greater than or equal to the threshold. Alternatively, the prefetch instruction insertion unit 36 may generate intermediate code of “NOP” (intermediate code for no operation). When a directive to output a prefetch instruction for target data, that is, an instruction to insert a prefetch instruction into the machine language program, is set in advance, the prefetch instruction insertion unit 36 may generate intermediate code for suppressing the directive for target data having an evaluation value greater than or equal to the threshold.

In this manner, the prefetch instruction insertion unit 36 prepares for suppression of insertion of a prefetch instruction for target data for which the performance of the arithmetic processing device is determined to be degraded by performing the software prefetch.

In S700, the prefetch instruction insertion unit 36 inserts the intermediate code generated in S500 or S600 into the intermediate code generated in S100 to update the intermediate code generated in S100. In this manner, the prefetch instruction insertion unit 36 optimizes the intermediate code generated in S100. That is to say, the prefetch instruction insertion unit 36 updates the intermediate code on the basis of the determination result (the determination result in S400) of whether to suppress insertion of a prefetch instruction for target data.

In S800, the code generation unit 40 converts the intermediate code updated in S700 into the machine language program MPRG. Accordingly, the machine language program MPRG that is executable by the arithmetic processing device is generated. Insertion of an unnecessary prefetch instruction is suppressed, and a prefetch instruction that contributes to improving the performance of the arithmetic processing device is inserted into the machine language program MPRG generated in S800. That is to say, the compiling device 10 may reduce insertion of an unnecessary prefetch instruction when prefetch instructions are inserted into the machine language program MPRG through the compilation process.

The operation of the compiling device 10 is not limited to this example. For example, the compiling device 10 may suppress insertion of a prefetch instruction without performing determination based on the profile information PINF when the access pattern of target data is determined to be in an ascending order or a descending order through a static analysis at the time of compiling the source program SPRG.

FIG. 3 illustrates an example of the arithmetic processing device 100 that executes the machine language program generated by the compiling device 10 illustrated in FIG. 1. The arithmetic processing device 100, for example, is a processor such as a central processing unit (CPU) and the like. The arithmetic processing device 100, for example, reads data stored in a main memory 150 to execute the machine language program MPRG generated by the compiling device 10. In addition, the arithmetic processing device 100 stores an execution result (operation result) of the machine language program MPRG in the main memory 150.

The arithmetic processing device 100, for example, includes an instruction decoder 110, an arithmetic unit 120, a cache memory 130, and the hardware prefetch control device 140. The instruction decoder 110 decodes an instruction indicated by the machine language program MPRG. Then, the instruction decoder 110 outputs the decoded instruction to the arithmetic unit 120.

The arithmetic unit 120 performs an operation based on the decoded instruction. For example, the arithmetic unit 120 performs an operation with reference to data stored in the cache memory 130. In addition, the arithmetic unit 120, for example, stores the operation result in the cache memory 130.

The cache memory 130 stores data received from the main memory 150. For example, the cache memory 130 stores data referred to by the arithmetic unit 120. Furthermore, the cache memory 130 stores data that is transferred from the main memory 150 by the hardware prefetch or the software prefetch. In addition, the cache memory 130 outputs data such as the operation result and the like to the main memory 150.

Given that, for example, CL is a cache line number of the cache memory 130, LS is the size (in bytes) of a line, WS is the capacity (in kilobytes) per way, and addr is a logical address, the cache line number CL is represented by Equation (1). The logical address addr, for example, is an address in the main memory 150.

CL = mod(addr/LS, (WS×1024)/LS)  (1)

In Equation (1), mod is a modulo operator. That is to say, the cache line number CL corresponds to the remainder of division of “addr/LS” by “(WS×1024)/LS”.
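
Equation (1) can be evaluated with the short sketch below. It assumes that addr/LS denotes the integer quotient (the index of the line containing the address) and that the way capacity is a whole multiple of the line size; both are assumptions, as is the function name cache_line_number.

```python
def cache_line_number(addr, line_size, way_size_kb):
    """Evaluate Equation (1): CL = mod(addr/LS, (WS x 1024)/LS).

    addr: logical address in bytes.
    line_size: LS, size of one cache line in bytes.
    way_size_kb: WS, capacity of one way in kilobytes.
    Integer division is assumed for addr/LS.
    """
    lines_per_way = (way_size_kb * 1024) // line_size
    return (addr // line_size) % lines_per_way
```

With a hypothetical 128-byte line and a 16-kilobyte way, each way holds 128 lines, so addresses that are 16384 bytes apart map to the same cache line number.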

The hardware prefetch control device 140 controls whether to perform the hardware prefetch. For example, the hardware prefetch control device 140 performs the hardware prefetch when it detects a series of cache misses occurring during an access in which the access destination changes regularly.

The hardware prefetch control device 140, for example, performs the hardware prefetch when detecting cache misses on consecutive cache lines during a continuous area access. The continuous area access, for example, is an access such that the destination shift amount is smaller than two lines, that is, the destination shift amount is within a range in which the cache lines are consecutive, when a cache line of the cache memory 130 is used as a unit of access.

The condition (operating condition) of performing the hardware prefetch is not limited to this example. For example, the hardware prefetch control device 140 may perform the hardware prefetch when it detects a series of cache misses occurring during an access in which the access destination changes in either one direction of an ascending order and a descending order at a certain access width.
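
The alternative operating condition, in which the access destination moves in one direction at a certain access width, might be checked as in the sketch below. Interpreting "a certain access width" as a constant non-zero stride between consecutive addresses, and requiring at least three accesses before a stride is recognized, are assumptions; real hardware prefetchers apply their own criteria.

```python
def fits_fixed_stride(addresses):
    """Return True when addresses move in one direction at a constant stride.

    Assumes at least three accesses are needed to establish a stride.
    A positive stride is ascending order, a negative stride descending.
    """
    if len(addresses) < 3:
        return False
    stride = addresses[1] - addresses[0]
    if stride == 0:
        return False
    return all(b - a == stride for a, b in zip(addresses, addresses[1:]))
```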

FIG. 4 illustrates an example of a method for generating the profile information PINF. In the source program SPRG illustrated in FIG. 4, for example, an operation of adding a value indicated by array data A(LIST(i)) to Y is repeated n times. The array data A(LIST(i)) is also referred to as array data A hereinafter. The access destination of the array data A, for example, is specified by array data LIST(i) and is determined when the source program SPRG is executed by the arithmetic processing device 100. Therefore, an access to the array data A is an irregular indirect memory access. That is to say, the array data A is target data for which a prefetch instruction is to be inserted. In the following description, the array data A is assumed to be target data.
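
The loop described for the source program SPRG can be illustrated by the following Python analogue. FIG. 4 itself is not reproduced here, so the function name indirect_sum and the use of zero-based indices are assumptions; the original source program may use a different language and index base.

```python
def indirect_sum(a, index_list):
    """Accumulate a[index_list[i]] into y for every i.

    The element of `a` that is read depends on the run-time contents of
    `index_list`, which is what makes the access indirect and irregular.
    """
    y = 0.0
    for i in range(len(index_list)):
        y += a[index_list[i]]
    return y


if __name__ == "__main__":
    # Hypothetical data: the access order into `a` is 3, 0, 0, 2.
    print(indirect_sum([1.0, 2.0, 3.0, 4.0], [3, 0, 0, 2]))
```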

First, in P10, a compilation option of generating the profile information is specified to a compiler, and the source program SPRG as a compilation target is compiled into a machine language program MPRG10. The compiler, for example, corresponds to a compilation program for realizing the operation of the compiling device 10.

The machine language program MPRG10 generated in P10, for example, is a program in object format. The machine language program MPRG10, for example, includes object code for generating the profile information and object code that is generated by the compilation of the source program SPRG. The object code for generating the profile information, for example, is object code for outputting the profile information PINF of data for which a prefetch instruction is to be inserted when the source program SPRG is executed.

Since the compilation option of generating the profile information is specified, the compiling device 10 generates, for example, when compiling the source program SPRG, object code for outputting the profile information of the array data A when the source program SPRG is executed.

In P20, various libraries LIB such as a profile library and the like are linked to the machine language program MPRG10 and a machine language program MPRG12 is generated. The machine language program MPRG12 is a program in executable format. The linking in P20, for example, is performed by using a linker. The linker, for example, is a program for linking the program MPRG10 in object format to the library LIB to generate the program MPRG12 in executable format. The linker may be included in the compilation program for realizing the operation of the compiling device 10 or may be a program separate from the compilation program.

In P30, the machine language program MPRG12 is executed. Accordingly, the profile information PINF is generated. The arithmetic processing device 100, for example, executes the machine language program MPRG12 to output the profile information PINF of the array data A to an external unit.

In this manner, the compiling device 10 generates object code for causing the arithmetic processing device 100 which executes the source program SPRG to output the profile information PINF for each target data. The profile information PINF output for each target data by the arithmetic processing device 100, for example, includes an index and address information. The index indicates the access order of each piece of data in the target data, and the address information indicates the access destination. The method for generating the profile information PINF is not limited to this example.
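The kind of record described above (an index giving the access order, plus address information giving the access destination) can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the function name `run_with_profile`, the 8-byte element size, the base address, and the 0-based indexing are hypothetical and are not taken from the embodiment, which generates object code rather than Python.

```python
def run_with_profile(a, index_list, base_addr=0, elem_size=8):
    """Sum a[index_list[i]] and record (access order, address) per access.

    base_addr and elem_size are assumed values, used only to turn an
    element index into an illustrative byte address.
    """
    y = 0
    profile = []  # one (index, address) record per access to the target data
    for order, idx in enumerate(index_list, start=1):
        profile.append((order, base_addr + idx * elem_size))
        y += a[idx]  # the indirect access corresponding to A(LIST(i))
    return y, profile


a = list(range(100, 110))                 # stand-in for the array data A
y, pinf = run_with_profile(a, [5, 1, 9])  # stand-in for LIST
```

Here `pinf` plays the role of the profile information PINF for one loop over the target data.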

FIG. 5 illustrates an example of a method for applying the profile information PINF. The source program SPRG is the same as or similar to the source program SPRG described in FIG. 4. The profile information PINF here, for example, is the profile information PINF generated by the operation illustrated in FIG. 4. Detailed description of the operation that is the same as or similar to the operation illustrated in FIG. 4 will be omitted.

In P12, for example, a compilation option of applying the profile information is specified to the compiler, and the source program SPRG as a compilation target is compiled into a machine language program MPRG20. At this time, since the compilation option of applying the profile information is specified, the compiling device 10, for example, determines whether to suppress insertion of a prefetch instruction for irregularly and indirectly accessed data (for example, the array data A) on the basis of the profile information PINF.

Upon detecting irregularly and indirectly accessed data (for example, the array data A), the compiling device 10, for example, determines that the detected data is target data for which a prefetch instruction is to be inserted. Then, the compiling device 10 determines whether to suppress insertion of a prefetch instruction for the target data (irregularly and indirectly accessed data) on the basis of the profile information PINF. That is to say, the compiling device 10 determines whether to insert a prefetch instruction for irregularly and indirectly accessed data (for example, the array data A) into the machine language program MPRG20 on the basis of the profile information PINF.

Therefore, when the compiling device 10 determines to suppress insertion of a prefetch instruction for the array data A, the compiling device 10, for example, does not insert a prefetch instruction for the array data A into the machine language program MPRG20. When the compiling device 10 determines not to suppress insertion of a prefetch instruction for the array data A, the compiling device 10, for example, inserts a prefetch instruction for the array data A into the machine language program MPRG20. In this manner, the determination result of whether to suppress insertion of a prefetch instruction is reflected in the machine language program MPRG20 generated in P12. The machine language program MPRG20, for example, is a program in object format.

In P22, for example, various libraries LIB are linked to the machine language program MPRG20 and a machine language program MPRG22 is generated. The machine language program MPRG22 is a program in executable format. The linking in P22, for example, is performed by using the linker. In P32, the machine language program MPRG22 is executed.

The method for applying the profile information PINF is not limited to this example. For example, the target data may be specified by an optimization control line, a pragma, or the like, for giving an instruction to the compiler. The optimization control line, for example, is used in FORTRAN. The pragma, for example, is used in C language and the like.

A user, for example, may write an optimization control line (for example, “!ocl Prefetch_index(auto, LIST)”) for specifying the array data A(LIST(i)) as target data for which a prefetch instruction is to be inserted in the source program SPRG. In this case, the compiling device 10 determines that the array data A(LIST(i)) is target data. Therefore, data that is not specified by the optimization control line is not determined to be target data even if the data is irregularly and indirectly accessed data. In P10 illustrated in FIG. 4, the compiling device 10 may generate object code for generating the profile information PINF only for data specified by the optimization control line when target data is specified by the optimization control line.

FIG. 6 is a diagram illustrating an example of a relationship between the access order of the array data A(LIST(i)) and the prefetch. The source program SPRG is the same as or similar to the source program SPRG described in FIG. 4. The array data A(LIST(i)) is target data for which a prefetch instruction is to be inserted. The sign i in FIG. 6 corresponds to the access order of the array data A. The value of the array data LIST(i) in FIG. 6 corresponds to the access destination (the address in the memory area) of the array data A.

In case 1, the respective values of the array data LIST(1), the array data LIST(2), and the array data LIST(3) are 555, 11, and 2000. Therefore, the array data A(555), the array data A(11), and the array data A(2000) are accessed in this order. In this manner, the access destination of the array data A irregularly changes in order of the addresses 555, 11, and 2000 in the memory area.

For this reason, the hardware prefetch is not performed for the array data A. The software prefetch is performed for the array data A. Therefore, a prefetch instruction is inserted for the array data A. “prefetch A(LIST(i+α))” in FIG. 6 indicates that the value of the array data A that is to be accessed α accesses later is prefetched. Accordingly, the prefetch is performed even for irregularly and indirectly accessed data. As a result, according to the present embodiment, the waiting time for referring to data may be shortened.

In case 2, the respective values of the array data LIST(1), the array data LIST(2), the array data LIST(3), and the array data LIST(4) are 1, 2, 3, and 4. Therefore, the array data A(1), the array data A(2), the array data A(3), and the array data A(4) are accessed in this order. In this manner, the access destination of the array data A sequentially changes in order of the addresses 1, 2, 3, and 4 in the memory area.

For this reason, the hardware prefetch is performed for the array data A. Thus, according to the present embodiment, the waiting time for referring to data may be shortened. Since the access destination of the array data A changes in accordance with a predetermined rule corresponding to the operating condition of the hardware prefetch, the software prefetch is not performed for the array data A. That is to say, according to the present embodiment, the situation in which both of the hardware prefetch and the software prefetch are performed concurrently may be suppressed when the access destination of target data changes regularly.

The compiling device 10, for example, suppresses insertion of a prefetch instruction for the array data A. Thus, according to the present embodiment, insertion of an unnecessary prefetch instruction may be suppressed when the access destination of target data changes regularly. In this manner, according to the present embodiment, insertion of an unnecessary prefetch instruction may be reduced when prefetch instructions are inserted into the machine language program through the compilation process. As a result, according to the present embodiment, the performance degradation of the arithmetic processing device 100 due to insertion of an unnecessary prefetch instruction may be reduced.

Determination of whether to suppress insertion of a prefetch instruction is based on an evaluation value as described in FIG. 2. The evaluation value is generated, for example, based on a succession length management table TABL2 illustrated in FIG. 9. The succession length management table TABL2 is generated, for example, based on an access attribute table TABL1 illustrated in FIG. 7. The access attribute table TABL1 is generated based on the profile information PINF.

FIG. 7 illustrates examples of the profile information PINF and the access attribute table TABL1. The profile information PINF and the access attribute table TABL1, for example, are generated for each unit of insertion of a prefetch instruction. For example, the profile information PINF and the access attribute table TABL1 are generated for each loop process for target data.

The profile information PINF of the array data A, for example, includes an element number i and the cache line number CL. The element number i indicates the access order, and the cache line number CL indicates the access destination. In the example in FIG. 7, the array size n of the array data A is 16. The cache line number CL, for example, is the address information that indicates the access destination. The cache line number CL indicates the cache line of the cache memory 130 where each piece of data of the array data A is stored. Computation of the cache line number CL, for example, is performed by using Equation (1), based on the logical address addr (for example, the address in the main memory 150) of each piece of data of the array data A.

According to the profile information PINF illustrated in FIG. 7, the respective cache line numbers CL, where pieces of data that correspond to the element numbers 1, 2, 3, and 4 in the array data A are stored, are 0, 1, 2, and 3. The respective cache line numbers CL, where pieces of data that correspond to the element numbers 5, 6, 7, and 8 in the array data A are stored, are 22, 33, 44, and 45. The respective cache line numbers CL, where pieces of data that correspond to the element numbers 9, 10, 11, and 12 in the array data A are stored, are 55, 56, 66, and 67. The respective cache line numbers CL, where pieces of data that correspond to the element numbers 13, 14, 15, and 16 in the array data A are stored, are 68, 69, 70, and 88.

The access attribute table TABL1 which is generated based on the profile information PINF of the array data A, for example, includes the element number i, the cache line number CL, and an access attribute Z. That is to say, the access attribute table TABL1 is generated by adding the access attribute Z to the profile information PINF.

Information “S” in a cell of the access attribute Z in FIG. 7 indicates that the access destination changes in accordance with the predetermined rule corresponding to the operating condition of the hardware prefetch function. Information “X” indicates that the access destination changes regardless of the predetermined rule. The access attributes Z where the information “S” and the information “X” are respectively set are also referred to as access attributes “S” and “X” hereinafter. The access attribute “S”, for example, indicates that the destination shift amount is smaller than two lines, that is, the destination shift amount is within the range in which the cache lines are consecutive. The access attribute “X” indicates that the change of the access destination is not consecutive, that is, a change between inconsecutive cache lines.

The access attribute is not set for the element number 1 (i=1) of the access attribute table TABL1 because the element number 1 is the first element accessed and has no previous element. The access attribute Z for the element numbers 2, 3, 4, 8, 10, 12, 13, 14, and 15 is set to the information “S” because the difference between the cache line number CL of each of these element numbers and the cache line number CL of the previous element number (i−1) is one line (smaller than two lines). The differences between the cache line numbers CL corresponding to the element numbers 5, 6, 7, 9, 11, and 16 and the cache line numbers CL of the previous element numbers (i−1) are respectively 19, 11, 11, 10, 10, and 18, all of which are greater than or equal to two lines (exceed one line). For this reason, the access attribute Z for the element numbers 5, 6, 7, 9, 11, and 16 is set to the information “X”.

The configuration of the profile information PINF and the access attribute table TABL1 is not limited to this example. For example, in addition to the element number i and the cache line number CL, the profile information PINF may include the address information indicating the logical address addr of each piece of data of the array data A. Alternatively, the profile information PINF may include the address information indicating the logical address addr instead of the cache line number CL.

The profile information PINF may include the access attribute Z. In this case, the profile information PINF may be used as the access attribute table TABL1. The cache line number CL of the access attribute table TABL1 may be omitted.

FIG. 8 illustrates an example of generation of the access attribute table. The operation illustrated in FIG. 8 may be realized only by hardware or may be realized by hardware under control of software. In the operation illustrated in FIG. 8, it is assumed that the predetermined access pattern corresponding to the operating condition of the hardware prefetch function is an access pattern in which the destination shift amount is smaller than two lines. The operation illustrated in FIG. 8 corresponds to S200 illustrated in FIG. 2.

In FIG. 8, the cache line number CL corresponding to the element number i is also referred to as a cache line number CL(i). Data that is accessed at the i-th time in the array data A, for example, is stored in the cache line of the cache line number CL(i).

In S210, the access attribute detection unit 32 sets the element number i to 1. In addition, the access attribute detection unit 32 obtains the cache line number CL(i) corresponding to the element number i from the profile information PINF. For example, when the element number i is one, the access attribute detection unit 32 reads, from the profile information PINF, the cache line number CL(1) corresponding to the element number 1 in the profile information PINF.

When the profile information PINF includes the logical address addr instead of the cache line number CL, the access attribute detection unit 32 computes the cache line number CL(i) corresponding to the element number i by using Equation (1).

In S212, the access attribute detection unit 32 increments the element number i. The access attribute detection unit 32, for example, adds one to the value of the element number i to update the element number i.

In S214, the access attribute detection unit 32 obtains the cache line number CL(i) corresponding to the element number i from the profile information PINF. For example, the access attribute detection unit 32 reads, from the profile information PINF, the cache line number CL(i) corresponding to the element number i in the profile information PINF.

In S216, the access attribute detection unit 32 determines whether the absolute value of the difference between the cache line number CL(i) and the cache line number CL(i−1) is smaller than two. When the element number i is two, the cache line number CL(i−1) has been obtained in S210, for example. When the element number i is greater than or equal to three, the cache line number CL(i−1) has been obtained in the previous operation of S214.

When the absolute value of the difference between the cache line number CL(i) and the cache line number CL(i−1) is smaller than two (Yes in S216), the operation of the access attribute detection unit 32 proceeds to S218. That is to say, the operation of the access attribute detection unit 32 proceeds to S218 when the access pattern, which is predicted based on the cache line numbers CL(i) and CL(i−1), satisfies the predetermined rule corresponding to the operating condition of the hardware prefetch function. In other words, the operation of the access attribute detection unit 32 proceeds to S218 when the access destination of the array data A changes in accordance with the predetermined rule corresponding to the operating condition of the hardware prefetch function.

When the absolute value of the difference between the cache line number CL(i) and the cache line number CL(i−1) is greater than or equal to two (No in S216), the operation of the access attribute detection unit 32 proceeds to S220. That is to say, the operation of the access attribute detection unit 32 proceeds to S220 when the access pattern, which is predicted based on the cache line numbers CL(i) and CL(i−1), does not satisfy the predetermined rule corresponding to the operating condition of the hardware prefetch function. In other words, the operation of the access attribute detection unit 32 proceeds to S220 when the access destination of the array data A changes regardless of the predetermined rule corresponding to the operating condition of the hardware prefetch function.

In S218, the access attribute detection unit 32 stores the information “S”, which indicates that the access destination changes in accordance with the predetermined rule corresponding to the operating condition of the hardware prefetch function, in the access attribute Z corresponding to the element number i. The access attribute detection unit 32, for example, sets the information “S”, which indicates a consecutive access, for the access attribute Z corresponding to the element number i.

In S220, the access attribute detection unit 32 stores the information “X”, which indicates that the access destination changes regardless of the predetermined rule corresponding to the operating condition of the hardware prefetch function, in the access attribute Z corresponding to the element number i. The access attribute detection unit 32, for example, sets the information “X”, which indicates an inconsecutive access, for the access attribute Z corresponding to the element number i.

In S222, the access attribute detection unit 32 determines whether the element number i has the same value as the array size n of the array data A. When the element number i does not have the same value as the array size n of the array data A (No in S222), the operation of the access attribute detection unit 32 returns to S212.

When the element number i has the same value as the array size n of the array data A (Yes in S222), the operation of the access attribute detection unit 32 ends. Accordingly, for example, the access attribute table TABL1 illustrated in FIG. 7 is generated.
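The flow of S210 to S222 can be sketched in Python as follows, using the cache line numbers from FIG. 7 as input. This is a minimal illustration, not the implementation of the access attribute detection unit 32: the function name `build_access_attributes` and the use of `None` for the unset attribute of element number 1 are assumptions made for the sketch.

```python
def build_access_attributes(cl):
    """Return the access attribute Z for element numbers 1..n.

    cl[k] holds the cache line number CL(k+1). Element number 1 has no
    previous element, so its attribute is left as None (an assumed
    representation of "not set").
    """
    n = len(cl)
    z = [None] * n
    i = 1                                  # S210: start at element number 1
    while i < n:                           # S222: repeat until i reaches n
        i += 1                             # S212: increment the element number
        cur, prev = cl[i - 1], cl[i - 2]   # S214: CL(i), with CL(i-1) kept
        if abs(cur - prev) < 2:            # S216: shift smaller than two lines?
            z[i - 1] = "S"                 # S218: consecutive access
        else:
            z[i - 1] = "X"                 # S220: inconsecutive access
    return z


# Cache line numbers CL for element numbers 1 to 16, as listed for FIG. 7.
CL = [0, 1, 2, 3, 22, 33, 44, 45, 55, 56, 66, 67, 68, 69, 70, 88]
Z = build_access_attributes(CL)
```

With this input, `Z` carries “S” for the element numbers 2, 3, 4, 8, 10, 12, 13, 14, and 15 and “X” for the element numbers 5, 6, 7, 9, 11, and 16.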

The operation of the access attribute detection unit 32 is not limited to this example. For example, in S216, the access attribute detection unit 32 may determine whether the difference between the cache line number CL(i) and the cache line number CL(i−1) is equal to a predetermined line width. That is to say, the access attribute detection unit 32 may determine whether the access destination, when a cache line of the cache memory 130 is used as a unit of access, changes in one direction, either ascending or descending, at a certain line width. For example, S218 is performed when the difference between the cache line number CL(i) and the cache line number CL(i−1) is equal to the predetermined line width.

Alternatively, in S216, the access attribute detection unit 32 may determine whether the access destination indicated by the logical address changes in one direction, either ascending or descending, at a certain access width. That is to say, the access attribute detection unit 32 may detect data of which the access destination changes in one direction, either ascending or descending, at a certain access width. For example, S218 is performed for data of which the access destination changes in this manner.
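One way to picture this alternative check is the Python sketch below, which marks an access “S” only when the logical address moves by exactly a given signed width (positive for ascending, negative for descending). The function name `stride_attributes`, the signed `stride` parameter, and the sample addresses are assumptions for illustration; the embodiment does not prescribe them.

```python
def stride_attributes(addrs, stride):
    """Return access attributes based on a fixed signed access width.

    addrs  : logical addresses in access order
    stride : assumed access width; > 0 for ascending, < 0 for descending
    The first access has no previous address, so it is left as None.
    """
    if not addrs:
        return []
    z = [None]
    for prev, cur in zip(addrs, addrs[1:]):
        # "S" only when the destination moves by exactly the given width
        z.append("S" if cur - prev == stride else "X")
    return z


ascending = stride_attributes([1000, 1064, 1128, 1192, 5000, 5064], 64)
descending = stride_attributes([512, 448, 384], -64)
```

In the ascending example, the jump from 1192 to 5000 is marked “X”, while every step of 64 is marked “S”.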

FIG. 9 illustrates examples of the access attribute table TABL1 and the succession length management table TABL2. The succession length management table TABL2, for example, is generated for each unit of insertion of a prefetch instruction. The succession length management table TABL2, for example, is generated for each loop process for target data for which a prefetch instruction is to be inserted.

The succession length management table TABL2 of the array data A, for example, includes information that indicates the length (succession length L) of successive memory accesses that fit the predetermined access pattern and information that indicates the occurrence frequency c for each succession length L. The maximum value of the succession length L, for example, is a value obtained by subtracting one from the array size n of the array data A. The succession length management table TABL2 illustrated in FIG. 9 is an example of a succession length management table TABL2 when the array size n of the array data A is 16.

In the access attribute table TABL1, there is one instance of the part where the access attribute “S” occurs three times consecutively (succession length L=3), two instances of the part where the access attribute “S” occurs once (succession length L=1), and one instance of the part where the access attribute “S” occurs four times consecutively (succession length L=4). The number of the access attributes “X” is six.

Therefore, the occurrence frequency c of the successive memory accesses with the succession length L of zero is set to six in the succession length management table TABL2. The occurrence frequency c of the successive memory accesses with the succession length L of one is set to two. The occurrence frequency c of the successive memory accesses with the succession length L of three is set to one, and the occurrence frequency c of the successive memory accesses with the succession length L of four is set to one. The occurrence frequency c of the successive memory accesses with the succession length L of each of two, five, . . . , and 15 is set to zero.
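The counts stated above can be reproduced with a short Python sketch that counts runs of “S” directly and counts every “X” toward the succession length of zero. It is a sketch under assumptions, not a transcription of any flowchart of the embodiment: the function name `build_succession_table` and the `None` entry for element number 1 are hypothetical.

```python
def build_succession_table(z, max_len):
    """Return c, where c[L] is the occurrence frequency of succession length L.

    Each "X" adds one to c[0]; each maximal run of "S" of length L adds
    one to c[L]. Entries that are neither "S" nor "X" are ignored.
    """
    c = [0] * (max_len + 1)
    run = 0                      # length of the current run of "S"
    for attr in z:
        if attr == "S":
            run += 1
        elif attr == "X":
            c[0] += 1            # every "X" counts toward length zero
            if run:
                c[run] += 1      # close the run of "S" that just ended
            run = 0
    if run:
        c[run] += 1              # a run still open at the end of the table
    return c


# Access attributes for element numbers 1 to 16 (element 1 is unset).
Z = [None, "S", "S", "S", "X", "X", "X", "S",
     "X", "S", "X", "S", "S", "S", "S", "X"]
c = build_succession_table(Z, max_len=len(Z) - 1)
```

For this input, `c[0]` is 6, `c[1]` is 2, `c[3]` is 1, `c[4]` is 1, and all other entries are 0.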

The succession length management table TABL2, for example, is referred to when the evaluation value for the array data A (target data) is computed. The evaluation value computation unit 34, for example, may compute the evaluation value (Va in Equation (2)) that is represented by Equation (2).

Va=Σ(L(i)×c(i))  (2)

L(i) and c(i) in Equation (2) respectively indicate the i-th succession length L and the i-th occurrence frequency c in the succession length management table TABL2. i in Equation (2) is an integer ranging from one to m when the number of elements in the succession length management table TABL2 is m (16 in the example in FIG. 9). The evaluation value Va represented by Equation (2) is an example of an evaluation value which is computed based on the succession length L and the occurrence frequency c.
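As a worked illustration of Equation (2), the following Python sketch applies it to the occurrence frequencies described for FIG. 9. The function name `evaluation_value` is hypothetical, and the list is indexed by the succession length L itself (starting at zero) rather than by the 1-based position i used in the equation; this is an indexing choice made for the sketch.

```python
def evaluation_value(c):
    """Compute Va as the sum of L * c(L) over all succession lengths L."""
    return sum(length * freq for length, freq in enumerate(c))


# Occurrence frequencies for L = 0..15 as described for FIG. 9.
c = [6, 2, 0, 1, 1] + [0] * 11
va = evaluation_value(c)   # 0*6 + 1*2 + 3*1 + 4*1
```

For this table the sum works out to 9, which equals the total number of accesses marked “S”.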

The configuration of the succession length management table TABL2 is not limited to this example. For example, the maximum value of the succession length L in the succession length management table TABL2 may be smaller than the value obtained by subtracting one from the array size n of the array data A. When, for example, the array size n of the array data A is 16, the maximum value of the succession length L may be approximately half the array size n (eight in the example in FIG. 9). In this case, the number of the parts where the access attribute “S” occurs consecutively a number of times greater than or equal to the maximum value (for example, eight) of the succession length L is set as the occurrence frequency c of the successive memory accesses with the succession length L equal to the maximum value.
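The effect of a smaller maximum value can be sketched in Python by folding every entry above the maximum into the entry for the maximum. The function name `cap_table` and the sample frequencies (one run of ten “S”, one run of three, and two “X”) are made up for illustration and do not come from FIG. 9.

```python
def cap_table(c, max_len):
    """Fold the frequencies of succession lengths above max_len into max_len.

    c[L] is the occurrence frequency for succession length L in an
    uncapped table; the result has entries for L = 0..max_len only.
    """
    capped = list(c[:max_len + 1])
    capped += [0] * (max_len + 1 - len(capped))   # pad if c is shorter
    capped[max_len] += sum(c[max_len + 1:])       # runs longer than the cap
    return capped


# Hypothetical uncapped table for n = 16: two "X", one run of 3, one run of 10.
uncapped = [2, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
capped = cap_table(uncapped, max_len=8)
```

The run of length ten is then counted under the succession length of eight.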

The succession length management table TABL2 may include a weighting coefficient corresponding to the size of the succession length L. In this case, the evaluation value computation unit 34, for example, may compute the evaluation value Va that is represented by Equation (3). The meaning of i and c(i) in Equation (3) is the same as or similar to that in Equation (2). w(i) in Equation (3) indicates the weighting coefficient of the succession length L(i).

Va=Σ(c(i)×w(i))  (3)

The weighting coefficient w(i) may be a fixed value or may be a value that is arbitrarily set by a user. The evaluation value Va represented by Equation (3) is an example of an evaluation value which is computed based on the succession length L and the occurrence frequency c because the weighting coefficient w(i) is set for each succession length L(i).
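Equation (3) can be illustrated with the Python sketch below. The function name `weighted_evaluation_value` and, in particular, the weight values are hypothetical: the embodiment only states that the weighting coefficient may be fixed or set by a user, so the weights here are chosen purely to show the arithmetic.

```python
def weighted_evaluation_value(c, w):
    """Compute Va as the sum of c(L) * w(L) over all succession lengths L."""
    if len(c) != len(w):
        raise ValueError("c and w must have one entry per succession length")
    return sum(freq * weight for freq, weight in zip(c, w))


# Occurrence frequencies for L = 0..15 as described for FIG. 9.
c = [6, 2, 0, 1, 1] + [0] * 11
# Hypothetical weights that favor longer successions (not from the source).
w = [0, 1, 2, 4, 8] + [16] * 11
va = weighted_evaluation_value(c, w)   # 6*0 + 2*1 + 0*2 + 1*4 + 1*8
```

With these assumed weights the result is 14; a different choice of w(i) changes how strongly long successions influence the evaluation value.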

The compiling device 10, for example, may determine the frequency of performance of the hardware prefetch for the array data A more accurately by using the evaluation value which is computed based on the succession length L and the occurrence frequency c than in a case of using an evaluation value which is computed based only on the succession length L.

FIG. 10 illustrates an example of generation of the succession length management table. The operation illustrated in FIG. 10 may be realized only by hardware or may be realized by hardware under control of software. In the operation illustrated in FIG. 10, for example, it is assumed that the maximum value of the succession length L in the succession length management table TABL2 is the value obtained by subtracting one from the array size n of the array data A. The operation illustrated in FIG. 10 corresponds to the operation of a part of S300 illustrated in FIG. 2. S310 to S324 are performed by the evaluation value computation unit 34, for example. S310 to S324 may be performed by a module (for example, the access attribute detection unit 32) other than the evaluation value computation unit 34.

The occurrence frequency c(L) in FIG. 10 indicates the occurrence frequency c of the successive memory accesses with the succession length L. In the succession length management table TABL2 illustrated in FIG. 9, for example, the occurrence frequency c(0) indicates the occurrence frequency c (six in FIG. 9) of the successive memory accesses with the succession length L of zero, and the occurrence frequency c(1) indicates the occurrence frequency c (two in FIG. 9) of the successive memory accesses with the succession length L of one. The element number i in FIG. 10 indicates the element number i in the access attribute table TABL1.

In S310, the evaluation value computation unit 34 sets initial data. The evaluation value computation unit 34, for example, sets the element number i in the access attribute table TABL1 to two and sets a flag Bflag, the succession length L, and the occurrence frequency c(j) to zero. j is an integer ranging from zero to n−1. The flag Bflag is a flag for holding the access attribute Z of the previous element.

In S312, the evaluation value computation unit 34 determines whether the access attribute Z(i) corresponding to the element number i in the access attribute table TABL1 is equal to the access attribute “S” that indicates a consecutive access. When the access attribute Z(i) is equal to the access attribute “S” (Yes in S312), the operation of the evaluation value computation unit 34 proceeds to S314. When the access attribute Z(i) is not equal to the access attribute “S” (No in S312), the operation of the evaluation value computation unit 34 proceeds to S320. That is to say, the operation of the evaluation value computation unit 34 proceeds to S320 when the access attribute Z(i) is equal to the access attribute “X” that indicates an inconsecutive access.

In S314, the evaluation value computation unit 34 determines whether the flag Bflag is equal to one. That is to say, the evaluation value computation unit 34 determines whether the access attribute Z(i−1) of the previous element is equal to the access attribute “S”. When the flag Bflag is equal to one (Yes in S314), the operation of the evaluation value computation unit 34 proceeds to S316. That is to say, the operation of the evaluation value computation unit 34 proceeds to S316 when the access attribute Z(i−1) of the previous element is equal to the access attribute “S”.

When the flag Bflag is not equal to one (No in S314), the operation of the evaluation value computation unit 34 proceeds to S318. That is to say, the operation of the evaluation value computation unit 34 proceeds to S318 when the access attribute Z(i−1) of the previous element is equal to the access attribute “X”.

In S316, the evaluation value computation unit 34 increments the value of the succession length L. The evaluation value computation unit 34, for example, adds one to the value of the succession length L to update the succession length L. That is to say, the evaluation value computation unit 34 increments the value of the succession length L when both the access attribute Z(i) and the access attribute Z(i−1) are equal to the access attribute “S” (when a consecutive access continues).

In S318, the evaluation value computation unit 34 sets the value of the succession length L and the flag Bflag to one. That is to say, the evaluation value computation unit 34 sets the value of the succession length L and the flag Bflag to one when the access attribute Z(i) and the access attribute Z(i−1) are respectively equal to the access attributes “S” and “X” (when a consecutive access starts).

In S320, the evaluation value computation unit 34 increments the occurrence frequency c(L). The evaluation value computation unit 34, for example, adds one to the occurrence frequency c(L) to update the occurrence frequency c(L). Accordingly, the occurrence frequency of the successive memory accesses with the succession length L is incremented. After updating the occurrence frequency c(L), the evaluation value computation unit 34 sets the value of the succession length L and the flag Bflag to zero.

That is to say, the evaluation value computation unit 34 initializes the value of the succession length L and the flag Bflag to zero after incrementing the occurrence frequency c(L) when the access attribute Z(i) is equal to the access attribute “X” (in a case of an inconsecutive access).

In S322, the evaluation value computation unit 34 increments the element number i. The evaluation value computation unit 34, for example, adds one to the value of the element number i to update the element number i.

In S324, the evaluation value computation unit 34 determines whether the element number i has the same value as the array size n of the array data A. When the element number i does not have the same value as the array size n of the array data A (No in S324), the operation of the evaluation value computation unit 34 returns to S312.

When the element number i has the same value as the array size n of the array data A (Yes in S324), the evaluation value computation unit 34 ends the operation for generation of the succession length management table. Accordingly, for example, the succession length management table TABL2 illustrated in FIG. 9 is generated. The evaluation value computation unit 34, for example, computes an evaluation value with reference to the generated succession length management table TABL2.

The operation of the evaluation value computation unit 34 is not limited to this example. For example, in S316, the evaluation value computation unit 34 updates the succession length L such that the succession length L does not exceed the maximum value when the maximum value of the succession length L in the succession length management table TABL2 is smaller than n−1. The evaluation value computation unit 34, for example, determines whether the succession length L is equal to the maximum value before performing S316. When the succession length L is equal to the maximum value, the evaluation value computation unit 34 does not perform S316 but proceeds to S322.

FIG. 11 illustrates an exemplary hardware configuration of the compiling device 10 illustrated in FIG. 1. Similar reference numerals are given to the components that are similar to the components illustrated in FIG. 1 to FIG. 10, and detailed description thereof will be omitted.

A computer apparatus 200 includes a processor 210, a memory 220, a hard disk device 230, a peripheral interface 240, an output unit 250, an input unit 260, and an optical drive device 270. The processor 210, the memory 220, the hard disk device 230, the peripheral interface 240, the output unit 250, the input unit 260, and the optical drive device 270 are connected to one another through a bus 280.

The computer apparatus 200, for example, is connectable to the arithmetic processing device 100 through the peripheral interface 240. A removable disk 290 such as an optical disk can be mounted in the optical drive device 270, which reads information recorded on the mounted removable disk 290 and records information on the mounted removable disk 290. The processor 210, the memory 220, the hard disk device 230, the peripheral interface 240, the output unit 250, and the input unit 260 realize the functions of the compiling device 10, for example.

The output unit 250, for example, is a unit that outputs the result of a process performed by the processor 210. A display device, a touch panel, or the like is an example of the output unit 250. The input unit 260, for example, is a unit that receives user manipulation (for example, specifying a compilation option) and inputs data into the computer apparatus 200. A keyboard, a mouse, a touch panel, or the like is an example of the input unit 260.

The memory 220, for example, stores an operating system of the computer apparatus 200. In addition, for example, the memory 220 stores an application program such as the compilation program in order for the processor 210 to perform the operations illustrated in FIG. 2, FIG. 4, FIG. 5, FIG. 8, FIG. 10, and the like. The application program such as the compilation program may be stored in the hard disk device 230.

The application program such as the compilation program, for example, may be distributed by being recorded on the removable disk 290 such as an optical disk. The computer apparatus 200, for example, may read the application program such as the compilation program from the removable disk 290 through the optical drive device 270 and may store the application program in the memory 220 or the hard disk device 230. The computer apparatus 200 may also download the application program such as the compilation program through a communication device (not illustrated) that is connected to a network such as the Internet and may store the application program in the memory 220 or the hard disk device 230.

The processor 210, for example, realizes the functions of the compiling device 10 illustrated in FIG. 1 by executing the compilation program stored in the memory 220 or the like. That is to say, the compiling device 10 may be realized by the processor 210, the memory 220, the hard disk device 230, the peripheral interface 240, the output unit 250, and the input unit 260 in cooperation with each other.

A hardware configuration of the compiling device 10 is not limited to the example illustrated in FIG. 11. The compiling method and the like according to the present embodiment illustrated in FIG. 1 to FIG. 11, for example, may also be applied to a prefetch instruction for storage, in a compilation process for a processor that uses a prefetch instruction for storage when storing data.

As described above, in the compiling device according to the present embodiment illustrated in FIG. 1 to FIG. 11, whether to insert a prefetch instruction for target data is determined based on the profile information related to a memory access.

The access attribute detection unit 32 detects, from among memory accesses for accessing the target data for which a prefetch instruction is to be inserted, a memory access that fits a predetermined access pattern on the basis of profile information related to a memory access for accessing the target data. The predetermined access pattern, for example, is an access pattern corresponding to an operating condition of the hardware prefetch function.
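As a non-limiting illustration, one access pattern mentioned in this document, in which the destination shift amount between the current and the previous access is smaller than two cache lines, may be checked as in the following Python sketch. The function name `access_attributes`, the label "S" for a fitting access, and the 128-byte line size are assumptions made for illustration. The sketch also assumes that the addresses are already listed in access order and that the shift amount is taken as an absolute difference.

```python
LINE_SIZE = 128  # bytes per cache line; hypothetical value for illustration


def access_attributes(addresses, line_size=LINE_SIZE):
    """Label every access after the first one.

    Returns "S" when the shift from the previous access is smaller than
    two cache lines, and "X" (inconsecutive) otherwise.
    """
    attrs = []
    for prev, cur in zip(addresses, addresses[1:]):
        # Destination shift amount, measured in units of cache lines.
        shift = abs(cur // line_size - prev // line_size)
        attrs.append("S" if shift < 2 else "X")
    return attrs


attrs = access_attributes([0, 64, 128, 1024, 1100, 4096])
```

In this usage example the addresses fall in cache lines 0, 0, 1, 8, 8, and 32, so the shifts are 0, 1, 7, 0, and 24 lines.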

The evaluation value computation unit 34 computes an evaluation value for the target data on the basis of the length of successive memory accesses that fit the predetermined access pattern. The prefetch instruction insertion unit 36 determines whether to suppress insertion of a prefetch instruction for target data on the basis of the evaluation value and updates the intermediate code on the basis of the determination result. The prefetch instruction insertion unit 36, for example, updates the intermediate code by inserting a prefetch instruction for target data having an evaluation value smaller than a predetermined threshold and by suppressing insertion of a prefetch instruction for target data having an evaluation value exceeding the predetermined threshold.
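As a non-limiting illustration, the threshold comparison may be sketched in Python as follows. The function name `plan_prefetch`, the target names, and the numeric values are hypothetical. The description does not state what happens when the evaluation value equals the threshold, so this sketch assumes that insertion is suppressed in that case.

```python
def plan_prefetch(evaluation_values, threshold):
    """Split target data into (insert, suppress) lists by evaluation value.

    evaluation_values -- dict mapping a target data name to its evaluation value
    threshold         -- the predetermined threshold
    """
    insert, suppress = [], []
    for name, value in evaluation_values.items():
        if value < threshold:
            insert.append(name)    # hardware prefetch is unlikely to cover this data
        else:
            suppress.append(name)  # assumption: equality is treated as suppression
    return insert, suppress


insert, suppress = plan_prefetch({"A": 0.2, "B": 0.9}, threshold=0.5)
```

In this usage example a prefetch instruction would be inserted for the hypothetical target "A" and suppressed for "B".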

Thus, according to the present embodiment, insertion of an unnecessary prefetch instruction may be reduced when prefetch instructions are inserted into the machine language program through the compilation process. As a result, according to the present embodiment, the performance degradation of the arithmetic processing device 100 due to insertion of an unnecessary prefetch instruction may be reduced.

Here, for example, for the target data having an evaluation value smaller than a predetermined threshold, the proportion of data for which the hardware prefetch is performed is smaller than a predetermined proportion. Therefore, the proportion of data for which both the hardware prefetch and the prefetch by a prefetch instruction are performed is smaller than the predetermined proportion when a prefetch instruction is inserted. In this case, the merit (for example, shortening the waiting time for reference of data) of inserting a prefetch instruction is greater than the demerit of inserting a redundant prefetch instruction. For this reason, according to the present embodiment, a prefetch instruction is inserted for the target data having an evaluation value smaller than the threshold. Accordingly, the performance of the arithmetic processing device 100 may be improved.

For the target data having an evaluation value exceeding the predetermined threshold, for example, the proportion of data for which the hardware prefetch is performed is greater than the predetermined proportion. Therefore, the proportion of data for which both the hardware prefetch and the prefetch by a prefetch instruction are performed is greater than the predetermined proportion when a prefetch instruction is inserted. In this case, the demerit (for example, performance degradation such as a decrease in the transfer speed) of inserting a redundant prefetch instruction is greater than the merit of inserting a prefetch instruction. For this reason, according to the present embodiment, insertion of a prefetch instruction is suppressed for the target data having an evaluation value exceeding the threshold. Accordingly, the performance degradation of the arithmetic processing device 100 due to insertion of an unnecessary prefetch instruction may be reduced.

All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiment of the present invention has been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.

What is claimed is:
 1. A compiling method, comprising: converting, by a computer, a source program into intermediate code; detecting, on basis of profile information related to a memory access for accessing target data stored in a memory, a memory access that fits an access pattern corresponding to an operating condition of a hardware prefetch function for the target data, the target data being data for which a prefetch instruction is to be inserted in advance, the prefetch instruction being an instruction for transferring data stored in the memory to a cache memory; computing an evaluation value for the target data on basis of a length of successive memory accesses that fit the access pattern; determining, on basis of the evaluation value, whether to suppress insertion of a prefetch instruction for the target data; updating the intermediate code on basis of a result of the determination; and converting the updated intermediate code into a machine language program.
 2. The compiling method according to claim 1, wherein the computer computes the evaluation value on basis of an occurrence frequency of the successive memory accesses for each length of the successive memory accesses.
 3. The compiling method according to claim 1, wherein the memory access that fits the access pattern is such that a destination shift amount is smaller than two lines when a cache line of the cache memory is used as a unit of access, the destination shift amount being an amount of difference between a current access destination and a previous access destination.
 4. The compiling method according to claim 1, wherein the memory access that fits the access pattern is such that an access destination changes in either one direction of an ascending order and a descending order at a certain access width.
 5. The compiling method according to claim 1, wherein the profile information includes an index and address information, the index indicating an access order of each piece of data in the target data, the address information indicating an access destination, and the computer generates object code for causing a processor to output the profile information for the target data, the processor executing the source program.
 6. A compiling device, comprising: a first processor configured to convert a source program into intermediate code, detect, on basis of profile information related to a memory access for accessing target data stored in a memory, a memory access that fits an access pattern corresponding to an operating condition of a hardware prefetch function for the target data, the target data being data for which a prefetch instruction is to be inserted in advance, the prefetch instruction being an instruction for transferring data stored in the memory to a cache memory, compute an evaluation value for the target data on basis of a length of successive memory accesses that fit the access pattern, determine, on basis of the evaluation value, whether to suppress insertion of a prefetch instruction for the target data, update the intermediate code on basis of a result of the determination, and convert the updated intermediate code into a machine language program.
 7. The compiling device according to claim 6, wherein the first processor is configured to compute the evaluation value on basis of an occurrence frequency of the successive memory accesses for each length of the successive memory accesses.
 8. The compiling device according to claim 6, wherein the memory access that fits the access pattern is such that a destination shift amount is smaller than two lines when a cache line of the cache memory is used as a unit of access, the destination shift amount being an amount of difference between a current access destination and a previous access destination.
 9. The compiling device according to claim 6, wherein the memory access that fits the access pattern is such that an access destination changes in either one direction of an ascending order and a descending order at a certain access width.
 10. The compiling device according to claim 6, wherein the profile information includes an index and address information, the index indicating an access order of each piece of data in the target data, the address information indicating an access destination, and the first processor generates object code for causing a second processor to output the profile information for the target data, the second processor executing the source program.
 11. A non-transitory computer-readable recording medium having stored therein a program for causing a computer to execute a process, the process comprising: converting a source program into intermediate code; detecting, on basis of profile information related to a memory access for accessing target data stored in a memory, a memory access that fits an access pattern corresponding to an operating condition of a hardware prefetch function for the target data, the target data being data for which a prefetch instruction is to be inserted in advance, the prefetch instruction being an instruction for transferring data stored in the memory to a cache memory; computing an evaluation value for the target data on basis of a length of successive memory accesses that fit the access pattern; determining, on basis of the evaluation value, whether to suppress insertion of a prefetch instruction for the target data; updating the intermediate code on basis of a result of the determination; and converting the updated intermediate code into a machine language program.
 12. The non-transitory computer-readable recording medium according to claim 11, wherein the computer computes the evaluation value on basis of an occurrence frequency of the successive memory accesses for each length of the successive memory accesses.
 13. The non-transitory computer-readable recording medium according to claim 11, wherein the memory access that fits the access pattern is such that a destination shift amount is smaller than two lines when a cache line of the cache memory is used as a unit of access, the destination shift amount being an amount of difference between a current access destination and a previous access destination.
 14. The non-transitory computer-readable recording medium according to claim 11, wherein the memory access that fits the access pattern is such that an access destination changes in either one direction of an ascending order and a descending order at a certain access width.
 15. The non-transitory computer-readable recording medium according to claim 11, wherein the profile information includes an index and address information, the index indicating an access order of each piece of data in the target data, the address information indicating an access destination, and the computer generates object code for causing a processor to output the profile information for the target data, the processor executing the source program.